Skip to content

feat: use NucleotidePosition value object, add 0-based coordinate conversion#85

Open
KingKong1213 wants to merge 3 commits intomasterfrom
feat_use-nucleotide-position
Open

feat: use NucleotidePosition value object, add 0-based coordinate conversion#85
KingKong1213 wants to merge 3 commits intomasterfrom
feat_use-nucleotide-position

Conversation

@KingKong1213
Copy link
Contributor

@KingKong1213 KingKong1213 commented Mar 18, 2026

Introduces NucleotidePosition value object (same pattern as Chromosome) to centralize position validation. Constructors of GenomicPosition and GenomicRegion now accept NucleotidePosition instead of int.

Adds GenomicRegion::fromZeroBasedHalfOpen() / toZeroBasedHalfOpen() for BED/BAM/bigWig interoperability. Internally everything stays 1-based closed.

  • Added automated tests
  • Documented for all relevant versions

Changes

  • NucleotidePosition: value object wrapping SafeCast::toInt() + validation (>= 1)
  • GenomicPosition: constructor accepts NucleotidePosition instead of int
  • GenomicRegion: constructor accepts NucleotidePosition instead of int
  • GenomicRegion::fromZeroBasedHalfOpen(): factory for 0-based half-open coordinates (BED, BAM, bigWig)
  • GenomicRegion::toZeroBasedHalfOpen(): serialize back to 0-based half-open

Breaking changes

None. GenomicPosition and GenomicRegion were introduced in v6.2.0 (2 weeks ago) and are not yet widely adopted.

@KingKong1213 KingKong1213 requested a review from simbig March 18, 2026 09:55
Copy link
Contributor

@simbig simbig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guter PR — Value Object für die Position ist genau richtig, gleiche Idee wie Chromosome. Würde ich gern approven, aber die Validierungsgrenze hat sich von < 1 auf < 0 verschoben und das ändert leider das Koordinatensystem. length() rechnet end - start + 1 (1-based closed), Position 0 führt da zu off-by-one. Details + Lösungsvorschlag im Inline-Kommentar.

public function __construct($positionAsMixed)
{
$position = SafeCast::toInt($positionAsMixed);
if ($position < 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Die Grenze hat sich hier von < 1 (in GenomicPosition/GenomicRegion vorher) auf < 0 verschoben. Damit wird Position 0 valide, aber das bricht die 1-based Annahme die durch den Rest vom Code geht:

// length() rechnet 1-based closed: end - start + 1
$region = new GenomicRegion(new Chromosome('chr7'), new NucleotidePosition(0), new NucleotidePosition(100));
$region->length(); // → 101, aber 0-based half-open wären 100 Basen

Auch parse() unterstützt g.-Prefix (HGVS) — da ist g.0 per Definition ungültig.

Falls BED-Support der Treiber ist: BED ist ein I/O-Format, kein internes Koordinatensystem. Am saubersten als Konvertierung an der Grenze — gleicher Ansatz wie htsjdk/pysam:

Suggested change
if ($position < 0) {
if ($position < 1) {

BED-Support z.B. als Factory auf GenomicRegion:

public static function fromBed(string $chromosome, int $start, int $end): self
{
    // BED 0-based half-open → 1-based closed
    return new self(
        new Chromosome($chromosome),
        new NucleotidePosition($start + 1),
        new NucleotidePosition($end),
    );
}

Testcase dazu:

public function testFromBedConvertsToOneBased(): void
{
    // BED: chr7  55249070  55249171 (EGFR Exon 19, 0-based half-open)
    $region = GenomicRegion::fromBed('chr7', 55249070, 55249171);

    // Intern 1-based closed: chr7:55249071-55249171
    self::assertSame(55249071, $region->start);
    self::assertSame(55249171, $region->end);
    self::assertSame(101, $region->length());
}

NucleotidePosition must reject position 0 to stay consistent with
the 1-based closed coordinate system used by length(), containsPosition(),
intersection() and HGVS g. notation.

Add GenomicRegion::fromZeroBasedHalfOpen() and toZeroBasedHalfOpen()
for BED/BAM/bigWig interoperability: convert at the I/O boundary,
work internally in 1-based closed coordinates.

🤖 Generated with Claude Code
@simbig simbig changed the title Feat: Use nucleotide position instead of integer. Therefore, the inte… feat: use NucleotidePosition value object, add 0-based coordinate conversion Mar 18, 2026
Copy link
Contributor

@simbig simbig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixes are in: NucleotidePosition enforces >= 1, fromZeroBasedHalfOpen()/toZeroBasedHalfOpen() für BED/BAM/bigWig. Tests grün, PHPStan sauber.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants